12 research outputs found

    Segmentation and semantic labelling of RGBD data with convolutional neural networks and surface fitting

    We present an approach for segmentation and semantic labelling of RGBD data that jointly exploits geometrical cues and deep learning techniques. An initial over-segmentation is performed using spectral clustering, and a set of non-uniform rational B-spline (NURBS) surfaces is fitted to the extracted segments. A convolutional neural network (CNN) then receives as input colour and geometry data together with the surface fitting parameters. The network is made of nine convolutional stages followed by a softmax classifier and produces a vector of descriptors for each sample. In the next step, an iterative merging algorithm recombines the output of the over-segmentation into larger regions matching the various elements of the scene. Pairs of adjacent segments with high similarity according to the CNN features are candidates for merging, and the surface fitting accuracy is used to detect which pairs of segments belong to the same surface. Finally, a set of labelled segments is obtained by combining the segmentation output with the descriptors from the CNN. Experimental results show that the proposed approach outperforms state-of-the-art methods and provides accurate segmentation and labelling.
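    The iterative merging step can be illustrated with a minimal sketch. The names (`merge_segments`, `cosine_sim`) and the choice of averaging descriptors on merge are assumptions made for illustration, and the NURBS surface-fitting accuracy check that the full method uses to validate candidate merges is omitted here.

    ```python
    import numpy as np

    def cosine_sim(a, b):
        """Cosine similarity between two CNN descriptor vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def merge_segments(descriptors, adjacency, threshold=0.9):
        """Greedily merge the most similar adjacent segment pair until no
        pair exceeds the similarity threshold. Returns a map from each
        original segment id to the id of its merged region."""
        labels = {s: s for s in descriptors}
        desc = {s: np.asarray(d, dtype=float) for s, d in descriptors.items()}
        adj = {frozenset(p) for p in adjacency}
        while True:
            best, best_sim = None, threshold
            for pair in adj:
                a, b = tuple(pair)
                sim = cosine_sim(desc[a], desc[b])
                if sim > best_sim:
                    best, best_sim = (a, b), sim
            if best is None:
                break
            a, b = best
            # Merge b into a: average the descriptors, redirect adjacency
            # edges and region labels that pointed at b.
            desc[a] = (desc[a] + desc[b]) / 2.0
            del desc[b]
            adj = {frozenset(a if s == b else s for s in p) for p in adj}
            adj = {p for p in adj if len(p) == 2}
            for s, l in labels.items():
                if l == b:
                    labels[s] = a
        return labels
    ```

    For example, two adjacent segments with near-identical descriptors end up in the same region, while a dissimilar neighbour stays separate.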

    Stereo and ToF Data Fusion by Learning from Synthetic Data

    Time-of-Flight (ToF) sensors and stereo vision systems are both capable of acquiring depth information, but they have complementary characteristics and issues. A more accurate representation of the scene geometry can be obtained by fusing the two depth sources. In this paper we present a novel framework for data fusion where the contribution of the two depth sources is controlled by confidence measures that are jointly estimated using a Convolutional Neural Network. The two depth sources are fused by enforcing the local consistency of the depth data, taking the estimated confidence information into account. The deep network is trained on a synthetic dataset, and we show that the classifier generalizes to different data, obtaining reliable estimates not only on synthetic data but also on real-world scenes. Experimental results show that the proposed approach increases the accuracy of the depth estimation on both synthetic and real data and outperforms state-of-the-art methods.
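    The core weighting idea can be sketched as a pixel-wise blend of the two depth maps driven by their per-pixel confidences. This is a minimal illustration under assumed names (`fuse_depth`, `c_tof`, `c_stereo`); the actual framework additionally enforces local consistency across neighbouring pixels, which this sketch omits.

    ```python
    import numpy as np

    def fuse_depth(d_tof, d_stereo, c_tof, c_stereo, eps=1e-6):
        """Blend two depth maps pixel-wise using per-pixel confidence
        weights (e.g. predicted by a CNN). Weights are normalised so the
        more reliable source dominates at each location; eps guards
        against division by zero where both confidences vanish."""
        w = c_tof + c_stereo + eps
        return (c_tof * d_tof + c_stereo * d_stereo) / w
    ```

    Where the ToF confidence is 1 and the stereo confidence is 0, the fused value follows the ToF measurement, and vice versa.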

    Deep learning for scene understanding with color and depth data

    Significant advancements have been made in recent years concerning both data acquisition and processing hardware, as well as optimization and machine learning techniques. On one hand, the introduction of depth sensors in the consumer market has made the acquisition of 3D data possible at a very low cost, making it possible to overcome many of the limitations and ambiguities that typically affect computer vision applications based on colour information alone. At the same time, computationally faster GPUs have allowed researchers to perform time-consuming experiments even on big data. On the other hand, the development of effective machine learning algorithms, including deep learning techniques, has provided a highly performing tool to exploit the enormous amount of data now at hand. In light of these encouraging premises, three classical computer vision problems have been selected, and novel approaches for their solution are proposed in this work that both leverage the output of a deep Convolutional Neural Network (ConvNet) and jointly exploit colour and depth data to achieve competitive results. In particular, a novel semantic segmentation scheme for colour and depth data is presented that uses the features extracted from a ConvNet together with geometric cues. A method for 3D shape classification is also proposed that uses a deep ConvNet fed with specific 3D data representations. Finally, a ConvNet for ToF and stereo confidence estimation has been employed within a ToF-stereo fusion algorithm, avoiding reliance on complex yet inaccurate noise models for the confidence estimation task.

    An integer programming approach to hand pose estimation using depth data

    This thesis introduces a novel approach to the problem of estimating the pose of a hand from a single depth frame. We develop a scheme for finger and palm region extraction, followed by a Mixed Integer Programming formulation that completes the finger segmentation. A key idea is a novel method that rearranges the hand point cloud, through a simple transformation, into a form better suited to recognizing its structure.
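    The combinatorial assignment that the Mixed Integer Program solves can be illustrated at toy scale by exhaustive search. The function name `assign_fingers`, the template positions, and the distance-based objective are hypothetical stand-ins for the thesis's actual MIP formulation, which this sketch does not reproduce.

    ```python
    from itertools import permutations
    import math

    def assign_fingers(cluster_centers, finger_templates):
        """Exhaustively solve a small assignment problem: map each
        extracted point-cloud cluster to one finger label, minimising the
        total distance to hypothetical template positions. A MIP solver
        would handle the same objective at larger scale."""
        labels = list(finger_templates)
        best_cost, best_map = math.inf, None
        for perm in permutations(range(len(cluster_centers))):
            cost = sum(
                math.dist(cluster_centers[ci], finger_templates[labels[fi]])
                for fi, ci in enumerate(perm)
            )
            if cost < best_cost:
                best_cost = cost
                best_map = {labels[fi]: ci for fi, ci in enumerate(perm)}
        return best_map
    ```

    Each finger label receives the cluster whose centre lies closest to its template under the globally cheapest one-to-one assignment.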


    Deep Learning For 3D Shape Classification from Multiple Depth Maps

    This paper proposes a novel approach for the classification of 3D shapes exploiting deep learning techniques. The proposed algorithm starts by constructing a set of depth maps obtained by rendering the input 3D shape from different viewpoints. The depth maps are then fed to a multi-branch Convolutional Neural Network. Each branch of the network takes as input one of the depth maps and produces a classification vector using five convolutional layers of progressively reduced resolution. The various classification vectors are finally fed to a linear classifier that combines the outputs of the various branches and produces the final classification. Experimental results on the Princeton ModelNet database show that the proposed approach obtains high classification accuracy and outperforms several state-of-the-art approaches.
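    The late-fusion stage can be sketched as follows, with each branch reduced to a single linear map standing in for the five convolutional layers; the function name `multi_branch_classify` and the weight shapes are assumptions for illustration only.

    ```python
    import numpy as np

    def multi_branch_classify(depth_maps, branch_weights, fusion_weights):
        """Sketch of multi-branch late fusion: each branch maps its
        rendered depth map to a score vector, and a final linear
        classifier combines the concatenated branch outputs into one
        prediction (returned as the winning class index)."""
        branch_outputs = [w @ d.ravel() for w, d in zip(branch_weights, depth_maps)]
        fused = fusion_weights @ np.concatenate(branch_outputs)
        return int(np.argmax(fused))
    ```

    The design point is that the branches are evaluated independently per view, and only their outputs are combined, so adding a viewpoint only adds one branch.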

    Deep learning for 3D shape classification based on volumetric density and surface approximation clues

    This paper proposes a novel approach for the classification of 3D shapes exploiting surface and volumetric clues inside a deep learning framework. The proposed algorithm uses three different data representations. The first is a set of depth maps obtained by rendering the 3D object. The second is a novel volumetric representation obtained by counting the number of filled voxels along each direction. Finally, NURBS surfaces are fitted to the 3D object and surface curvature parameters are selected as the third representation. All three data representations are fed to a multi-branch Convolutional Neural Network. Each branch processes a different data source and produces a feature vector using convolutional layers of progressively reduced resolution. The extracted feature vectors are fed to a linear classifier that combines the outputs to produce the final predictions. Experimental results on the ModelNet dataset show that the proposed approach achieves state-of-the-art performance.
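    The volumetric density clue — counting filled voxels along each direction — reduces to axis-wise sums over the occupancy grid. A minimal sketch, assuming a boolean voxel grid and the illustrative name `directional_density`:

    ```python
    import numpy as np

    def directional_density(voxels):
        """For each of the three axes of an occupancy grid, count the
        filled voxels along that direction, producing three 2D density
        maps of the volume."""
        v = np.asarray(voxels, dtype=np.int32)
        return [v.sum(axis=a) for a in range(3)]
    ```

    Each of the three resulting maps is a 2D image the size of one face of the grid, which is what makes this representation convenient input for a convolutional branch.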

    Exploiting Silhouette Descriptors and Synthetic Data for Hand Gesture Recognition

    This paper proposes a novel real-time hand gesture recognition scheme explicitly targeted at depth data. The hand silhouette is first extracted from the acquired data, and two ad-hoc feature sets are computed from this representation. The first is based on the local curvature of the hand contour, while the second represents the thickness of the hand region close to each contour point, computed using a distance transform. The two feature sets are rearranged into a three-dimensional data structure representing the values of the two features at each contour location, and this representation is fed into a multi-class Support Vector Machine. The classifier is trained on a synthetic dataset generated with an ad-hoc rendering system developed for the purposes of this work. This approach allows fast construction of the training set without the need to manually acquire large training datasets. Experimental results on real data show that the approach achieves 90% accuracy on a typical hand gesture recognition dataset with very limited computational resources.
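    A simple stand-in for the local curvature feature is the turning angle at each contour point, computed between the vectors to its k-th previous and next neighbours. The name `contour_curvature` and the single-scale angle are illustrative assumptions; the paper's feature is a richer multi-scale curvature descriptor.

    ```python
    import math

    def contour_curvature(points, k=1):
        """For each point of a closed contour, return the turning angle
        (in radians, folded into [0, pi]) between the directions to its
        k-th previous and k-th next neighbours. Straight stretches give
        angles near pi; sharp fingertips give small angles."""
        n = len(points)
        angles = []
        for i in range(n):
            px, py = points[(i - k) % n]
            cx, cy = points[i]
            nx, ny = points[(i + k) % n]
            a1 = math.atan2(py - cy, px - cx)
            a2 = math.atan2(ny - cy, nx - cx)
            angle = abs(a1 - a2)
            angles.append(min(angle, 2 * math.pi - angle))
        return angles
    ```

    On a square contour every vertex is a right-angle corner, so every returned angle is pi/2.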

    3D hand shape analysis for palm and fingers identification

    This paper proposes a novel scheme for the extraction and identification of the palm and fingers from a single depth map. The hand is first segmented from the rest of the scene and then divided into palm and finger regions. For this task we employ a novel scheme that exploits the idea that fingers have a tubular shape while the palm is more planar. Following this rationale, we apply a contraction guided by the surface normals in order to reduce the fingers to thinner structures that can be identified by analyzing changes in point density. Density-based clustering is then applied, and finally a linear programming based approach is employed to identify the various fingers. Experimental results prove the effectiveness of the proposed approach even in complex situations and in the presence of occlusions between the various fingers.
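    Once the contraction has thinned the fingers into dense strands, density-based clustering separates them. A toy DBSCAN-flavoured sketch under assumed names (`density_clusters`, a single `radius` parameter) that simply links points closer than the radius:

    ```python
    import math

    def density_clusters(points, radius=0.6):
        """Group points into clusters by flood-filling over pairs whose
        distance is at most `radius`. Returns one integer label per
        input point. O(n^2); fine for a sketch, not for large clouds."""
        n = len(points)
        labels = [-1] * n
        cur = 0
        for i in range(n):
            if labels[i] != -1:
                continue
            stack, labels[i] = [i], cur
            while stack:
                j = stack.pop()
                for k in range(n):
                    if labels[k] == -1 and math.dist(points[j], points[k]) <= radius:
                        labels[k] = cur
                        stack.append(k)
            cur += 1
        return labels
    ```

    Two tight groups of points far apart from each other receive two distinct labels, which is exactly the behaviour needed to tell contracted finger strands apart.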

    Deep Learning for Confidence Information in Stereo and ToF Data Fusion

    This paper proposes a novel framework for the fusion of depth data produced by a Time-of-Flight (ToF) camera and a stereo vision system. The key problem of balancing the two sources of information is solved by extracting confidence maps for both sources using deep learning. We introduce a novel synthetic dataset accurately representing the data acquired by the proposed setup and use it to train a Convolutional Neural Network architecture. The machine learning framework estimates the reliability of both data sources at each pixel location. The two depth fields are finally fused by enforcing the local consistency of the depth data, taking the confidence information into account. Experimental results show that the proposed approach increases the accuracy of the depth estimation.